Abstract

In  this  work,  we  explore  various  ideas  and  approaches  to  deal  with  the  inherent  uncertainty  and  image 
noise  in  motion  analysis,  and  develop  a  low-complexity,  accurate  and  reliable  scheme  to  estimate  the 
motion  fields  from  UAV  navigation  videos.  The  motion  field  information  allows  us  to  accurately  estimate 
ego-motion  parameters  of  the  UAV  and  refine  (or  correct)  the  motion  measurements  from  other  sensors. 
Based  on  the  motion  field  information,  we  also  compute  the  range  map  for  objects  in  the  scene.  Once  we 
have  accurate  knowledge  about  the  vehicle  motion  and  its  navigation  environment  (range  map),  control 
and guidance laws can be designed to navigate the UAV between waypoints and avoid obstacles.


I  Introduction 

In  vision-based  UAV  navigation  control,  the  video  data  captured  from  the  on-board  camera  has  two 
major  purposes:  1)  It  will  be  used  to  determine  the  ranges  of  scene  objects.  Based  on  the  range  information, 
a  guidance  law  can  be  designed  for  the  UAV  such  that  it  is  able  to  avoid  obstacles  during  its  navigation 
between  waypoints.  2)  The  angular  velocity  estimated  from  the  gyroscope  often  has  a  significant  amount 
of  noise,  especially  in  the  yaw  angle  or  when  the  wind  effect  is  strong.  The  vision  information,  as  another 
source  of  sensor  information,  is  able  to  help  us  refine  the  estimation  of  angular  velocity  for  flight  control 
purposes. 

We  assume  that  the  UAV  has  a  fairly  good  knowledge  (from  GPS  data)  about  its  linear  velocity  in  the 
inertial  frame.  The  gyroscope  on  the  UAV  is  able  to  give  us  a  rough  estimation  of  the  vehicle’s  orientation. 
At  this  moment,  we  assume  that  the  camera  is  located  at  the  center  of  gravity  of  the  UAV,  and  the  camera 
orientation  is  the  same  as  the  UAV’s  body  orientation,  as  illustrated  in  Fig.  1.  We  map  the  linear  velocity 
from the inertial frame into the camera coordinate system and denote it by $V_b = (V_{b1}, V_{b2}, V_{b3})$.

Figure 1: Coordinate systems in UAV navigation.

Fig. 2 shows the block diagram of the proposed vision-based motion analysis and obstacle avoidance
system. We denote the video frame (also called an image) captured from the camera by $I_t(x, y)$,
$0 \le x < W$, $0 \le y < H$, where $t$ is the time index, and $W$ and $H$ are the width and height of
the frame. Using two consecutive video frames, $I_{t-1}(x, y)$ and $I_t(x, y)$, we determine the angular
velocity $\Omega$ of the camera and the range map $d_t(x, y)$. Here, $d_t(x, y)$ is the distance between
the camera and the scene object represented by pixel $(x, y)$. To determine $\Omega$ and $d_t(x, y)$, we
need an accurate estimate of the motion field $\{M_t(x, y) \mid 0 \le x < W,\ 0 \le y < H\}$, where
$M_t(x, y)$ represents the motion vector (or velocity vector) of pixel $(x, y)$. The major challenge in
computer-based motion field estimation is uncertainty. To manage the uncertainty and accurately estimate
the camera motion, we partition the image $I_t(x, y)$ into equal-sized blocks and classify them into two
groups: structural blocks (e.g., building edges, corners, and tree tops) and non-structural blocks (e.g.,
texture). The structural blocks have distinctive features that allow accurate motion estimation. To obtain
a reliable estimate of the camera motion, we propose a new motion estimation scheme called reliability-based
motion field estimation (rMFE), which is explained in Section II. Based on the motion information of the
structural blocks, we determine the angular motion of the camera. Once the camera motion is fully known, we
can roughly predict the image motion of each object in the scene. In other words, for each non-structural
block, we can determine a small image region within which its motion vector should lie. Inside this small
region, we find the best match for the block and determine its motion vector. Once the motion vector has
been obtained for each block (both structural and non-structural), we compute the range for each block and
obtain the range map $d_t(x, y)$. Based on the range map, a guidance law can be designed to control the UAV
such that it is able to avoid obstacles and maintain a steady flight between waypoints.


Figure  2:  Block  diagram  for  vision-based  ego-motion  analysis  and  obstacle  avoidance. 

II  Motion  Field  Analysis 

Our objective is to develop a low-complexity and robust motion analysis scheme to accurately estimate
the camera motion and range map for UAV navigation. Several approaches have been developed in the
literature, including feature tracking and structure from motion. The major problems with the existing
methods are: 1) High computational complexity. Feature tracking and structure analysis often involve
computation-intensive computer vision tasks, such as feature extraction and geometric modeling.
Computation-intensive motion analysis is not affordable on micro-UAV platforms, which have limited on-board
computation capability. 2) Constrained navigation environments. Many existing algorithms are developed
based on assumptions about the image content. For example, some algorithms assume that the image contains
buildings and look for building edges or corners for tracking and geometric modeling. Algorithms of this
type work efficiently in constrained environments with the desired image features, but may fail in other
navigation environments.

II-A  Uncertainty  in  Motion  Field  Analysis

During UAV navigation, the dominant change in image data from time $t-1$ to $t$ is caused by camera
motion, such as location change, camera rotation, and camera zoom in/out. In addition to this dominant
motion, there could be moving objects, such as vehicles or persons on the ground, which cause local changes
in the image data. Another major source of image content change is noise, including video capture
noise, changes in lighting conditions, etc. A typical approach to motion field estimation is to partition the
image $I_t$ into blocks. For each block $B$, we find its best match $A$ in the previous frame (also called
the reference frame) $I_{t-1}$. To find the best match, we need to define a distance (or similarity) metric
that measures how close block $A$ is to $B$. This distance metric is denoted by $d(A, B)$. With this metric,
we can then search in the previous frame within a neighborhood of $B$ to find the best-match block, which
minimizes the distance metric. In block-based motion analysis, especially for images with noise and few
distinctive features, it is sometimes very hard to determine the exact motion based on local information
alone, even for a human observer. For example, as shown in Fig. 3, blocks on building edges, in regions with
flat colors, or in texture areas such as grassland may have a number of "best matches" within their
neighborhoods. In this case, there is a lot of uncertainty and ambiguity in determining the exact motion of
the block. If wrong motion vectors are selected, the camera motion parameters and range map estimation will
be inaccurate.

We propose three major ideas to deal with the uncertainty in motion field analysis. 1) We classify the
image blocks into two groups: structural and non-structural blocks. The structural blocks, which have
distinctive features, yield reliable motion estimates. 2) We allow the motion estimation to find multiple
"best" motion vectors, instead of one single best motion vector for each block as in conventional motion
estimation. We also define a reliability metric to measure how reliable the motion estimation is in each
block. 3) We use the reliable motion information from the structural blocks to estimate the camera motion.
Accurate knowledge of the camera motion helps us reduce the uncertainty in motion analysis for
non-structural blocks and find their true motion.


Figure  3:  Sample  video  frames  in  UAV  navigation. 

II-B  Classification  of  Structural  and  Non-Structural  Blocks 

Structural blocks, such as building edges, corners, road lines, and tree tops, have distinctive features
for accurate motion estimation. In the frequency domain, a structural block has a major portion of its
total energy in the low-to-medium frequency bands. The proposed classification algorithm has three major
steps: Step 1) We apply the discrete cosine transform (DCT) to the block and order the transform
coefficients $x_i$, $0 \le i < S$, from low to high frequencies. Step 2) The first 20% of the coefficients,
except the DC coefficient, contain a significant amount of structural information, such as edges, corners,
and other patterns. For each block, we define an energy ratio for the structural coefficients:

$$r = \sum_{i=1}^{\lfloor 0.2 S \rfloor} x_i^2 \Big/ \sum_{i=1}^{S-1} x_i^2. \tag{1}$$

Step 3) We select a portion of the blocks, for example the top 15%, with the highest structural energy
ratios as the structural blocks, and classify the rest as non-structural. Fig. 4 shows the
classification results. The structural blocks are highlighted with white boxes.
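The three classification steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the low-to-high frequency ordering is approximated by sorting 2-D DCT coefficients by the sum of their horizontal and vertical indices, the block is square, and the function names `dct2`, `structural_energy_ratio`, and `select_structural` are hypothetical rather than taken from the original implementation.

```python
import numpy as np

def dct2(block):
    """Orthonormal 2-D DCT-II of a square block, built from the 1-D DCT matrix."""
    n = block.shape[0]
    k = np.arange(n)
    # Row u of c is the u-th cosine basis function sampled at the n pixel centres.
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c @ block @ c.T

def structural_energy_ratio(block, fraction=0.2):
    """Energy of the first `fraction` of AC coefficients over the total AC energy."""
    coeffs = dct2(np.asarray(block, dtype=float))
    n = coeffs.shape[0]
    u, v = np.indices((n, n))
    # Assumed ordering: ascending u + v approximates "low to high frequency".
    order = np.argsort((u + v).ravel(), kind="stable")
    energy = coeffs.ravel()[order] ** 2
    ac = energy[1:]                      # drop the DC coefficient
    total = ac.sum()
    if total == 0.0:                     # perfectly flat block: no structure
        return 0.0
    n_struct = int(fraction * energy.size)
    return float(ac[:n_struct].sum() / total)

def select_structural(ratios, top_fraction=0.15):
    """Boolean mask marking the blocks with the highest energy ratios."""
    ratios = np.asarray(ratios, dtype=float)
    count = max(1, int(round(top_fraction * ratios.size)))
    mask = np.zeros(ratios.size, dtype=bool)
    mask[np.argsort(ratios)[::-1][:count]] = True
    return mask
```

With this sketch, a block containing a sharp vertical edge scores much higher than a block of white noise, which is the behavior the classification relies on.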


Figure  4:  Classification  into  structural  (highlighted  with  white  boxes)  and  non-structural  blocks.

II-C  Distance  Metric  for  Motion  Search 

In motion field analysis for UAV navigation, a desirable distance (or similarity) measure should be
invariant to camera motion and local object motion, and robust to image noise. To measure the similarity
between two blocks $A$ and $B$, we take two steps: 1) First, we extract a set of features from each block.
2) Second, we compute the distance between these two sets of features. In conventional motion estimation,
the pixel value is often used as the feature, and the distance measure is simply given by

$$d_0(A, B) = \sum_{i} \sum_{j} |a_{ij} - b_{ij}|, \tag{2}$$

which is the SAD (sum of absolute differences) measure used in many video compression systems. The SAD
metric is invariant only under translational motion.
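As a minimal sketch of conventional SAD-based block matching, the following Python code searches a small window of the previous frame for the block that best matches a block of the current frame. The block size, the search radius, the function names, and the sign convention of the returned motion vector (position of B minus position of its match) are assumptions made for illustration.

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equally sized blocks."""
    return int(np.abs(a.astype(np.int64) - b.astype(np.int64)).sum())

def best_match(prev, cur, top, left, size=8, search=4):
    """Exhaustive SAD search in `prev` around the block of `cur` at (top, left).

    Returns (cost, (dx, dy)), where (dx, dy) is the displacement from the
    matched block in `prev` to the block in `cur`.
    """
    block = cur[top:top + size, left:left + size]
    best = None
    for oy in range(-search, search + 1):
        for ox in range(-search, search + 1):
            y, x = top + oy, left + ox
            # Skip candidates that fall outside the previous frame.
            if y < 0 or x < 0 or y + size > prev.shape[0] or x + size > prev.shape[1]:
                continue
            cost = sad(prev[y:y + size, x:x + size], block)
            if best is None or cost < best[0]:
                best = (cost, (-ox, -oy))
    return best
```

For a richly textured frame that is simply translated, the search recovers the translation with zero cost; for flat or repetitive regions many candidates tie, which is the ambiguity discussed in Section II-A.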

To handle other types of camera motion, such as rotation, zoom in/out, and perspective changes, we
introduce one additional feature, called the intensity profile. The intensity profile aims to characterize
the intensity distribution in an image region. Let $O_B = (x_B, y_B)$ be the center position (pixel) of
block $B$. Let $C(O_B, r)$ be a circle centered at $O_B$ with radius $r$, as illustrated in Fig. 5. The
average intensity on this circle is given by

$$m(O_B, r) = \frac{1}{|C(O_B, r)|} \int_{C(O_B, r)} I_t(x, y)\, dx\, dy, \quad 0 < r \le R, \tag{3}$$

where $R$ is the maximum radius to search. For example, we can set $R$ to be the block width. The function
$m(O_B, r)$ is called the intensity profile for pixel $O_B$ or block $B$. Similarly, we can define the
intensity profile for the center pixel of block $A$:

$$m(O_A, r) = \frac{1}{|C(O_A, r)|} \int_{C(O_A, r)} I_{t-1}(x, y)\, dx\, dy. \tag{4}$$
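The intensity profile can be approximated numerically by averaging image samples taken along each circle. The sketch below assumes that "average intensity on this circle" refers to the circle's perimeter, uses bilinear interpolation for sub-pixel samples, and assumes the circles stay inside the image; the function names and the number of angular samples are illustrative choices.

```python
import numpy as np

def bilinear(img, x, y):
    """Bilinearly interpolate `img` at floating-point positions (x, y)."""
    x0 = np.clip(np.floor(x).astype(int), 0, img.shape[1] - 2)
    y0 = np.clip(np.floor(y).astype(int), 0, img.shape[0] - 2)
    fx, fy = x - x0, y - y0
    return ((1 - fx) * (1 - fy) * img[y0, x0] + fx * (1 - fy) * img[y0, x0 + 1]
            + (1 - fx) * fy * img[y0 + 1, x0] + fx * fy * img[y0 + 1, x0 + 1])

def intensity_profile(img, cx, cy, radii, samples=64):
    """Mean intensity on circles of the given radii centred at (cx, cy)."""
    theta = 2.0 * np.pi * np.arange(samples) / samples
    profile = []
    for r in radii:
        xs = cx + r * np.cos(theta)
        ys = cy + r * np.sin(theta)
        profile.append(bilinear(img, xs, ys).mean())
    return np.array(profile)
```

Because each value is an average over a full circle, rotating the image about the circle center leaves the profile unchanged up to sampling error; a 90-degree rotation with a sample count divisible by four reproduces it essentially exactly.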

We can see that if pixel $O_A$ in frame $I_{t-1}$ moves to $O_B$ in frame $I_t$, the intensity profiles
$m(O_B, r)$ and $m(O_A, r)$ will be the same even with camera rotation. However, with camera zoom,
$m(O_B, r)$ and $m(O_A, r)$ will be different. For example, if the camera zooms out, $m(O_A, r)$ will match
the first segment of $m(O_B, r)$ after being scaled horizontally (either compressed or stretched), as
illustrated in Fig. 5. Based on this observation, we can define another distance measure as follows:

$$d_1(A, B) = \min_{1-\delta \le \lambda \le 1+\delta}\ \max_{0 < r \le R/\lambda} |m(O_A, \lambda \cdot r) - m(O_B, r)|, \tag{5}$$

where $\lambda$ is the scaling factor, and $[1-\delta, 1+\delta]$ is the search range for $\lambda$. It can
be seen that the distance (or similarity) metric $d_1(A, B)$ is invariant under camera rotation and zoom.
The distance metrics $d_0(A, B)$ and $d_1(A, B)$ capture different information about the similarity between
blocks. We form a comprehensive distance metric for motion search as follows:

$$d(A, B) = w \cdot d_0(A, B) + (1 - w) \cdot d_1(A, B), \tag{6}$$

where $w$ is the weighting factor, which can be adjusted according to the amount of camera rotation and
zoom. For example, if the angular velocity of the camera is relatively small, the SAD term $d_0$ is more
trustworthy and we can choose a larger value of $w$. Once the distance metric is established, we can then
search the neighborhood of block $B$ in the previous frame $I_{t-1}$, denoted by $\mathcal{N}(B)$, to find
the best block $A^*$, which has the minimum distance to $B$:

$$A^* = \arg\min_{A \in \mathcal{N}(B)} d(A, B). \tag{7}$$

The difference vector between the center positions of blocks $A^*$ and $B$ is the motion vector.
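The scale-searched profile distance and the weighted combination can be sketched as follows. The code assumes the two intensity profiles have already been sampled on a common, increasing grid of radii starting at zero, evaluates the scaled profile by linear interpolation, and searches the scaling factor on a uniform grid; these discretization choices, the default parameter values, and the function names are assumptions for illustration. In practice the SAD and profile terms have different numeric ranges, so some normalization before mixing them would likely be needed.

```python
import numpy as np

def profile_distance(m_a, m_b, radii, delta=0.2, steps=41):
    """Min over scale factors of the max absolute profile difference."""
    radii = np.asarray(radii, dtype=float)
    r_max = radii[-1]
    best = np.inf
    for lam in np.linspace(1.0 - delta, 1.0 + delta, steps):
        keep = radii <= r_max / lam          # keep lam * r inside the sampled range
        a_scaled = np.interp(lam * radii[keep], radii, m_a)
        best = min(best, float(np.max(np.abs(a_scaled - m_b[keep]))))
    return best

def combined_distance(d0, d1, w=0.5):
    """Weighted mix of the SAD term d0 and the profile term d1."""
    return w * d0 + (1.0 - w) * d1
```

If one profile is a horizontally stretched copy of the other, the distance is close to zero when the search range covers the stretch factor, and large when only a scale of one is allowed (`delta=0`).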



Figure  5:  Definition  of  intensity  profile  for  blocks. 

II-D  Reliability-Based  Motion  Field  Estimation 

As discussed in Section I, the image is partitioned into equal-sized blocks, $\{B^n \mid 1 \le n \le N\}$.
Based on the frequency-domain information, we classify these blocks into structural and non-structural
blocks, as discussed in Section II-B. We denote the structural blocks by $\{B^m \mid 1 \le m \le M\}$,
$M < N$. As discussed in Section II-A, because of the inherent uncertainty and ambiguity in motion analysis,
each block may find multiple "best" matches according to the distance metric used in (7). In addition,
because of image noise, the true motion vector may not even have the minimum distance. To deal with this
problem, we propose a reliability-based motion field analysis scheme, explained in the following.
For each structural block $B^m$, we find the top $L$ best matches for $B^m$ in the previous frame
$I_{t-1}$, and the estimation results are denoted by $\Lambda = \{(V_j^m, d_j^m) \mid 1 \le j \le L\}$, where
$V_j^m = (\dot{x}_j^m, \dot{y}_j^m)$ represents the motion vector and $d_j^m$ is the corresponding distance.
Based on the data set $\{(V_j^m, d_j^m) \mid 1 \le j \le L\}$, we extract a representative motion vector,
denoted by $V^m$, and define a reliability measure $\gamma^m$. We assume $V^m$ is the "true" motion vector,
which could be wrong. Our basic idea is as follows: if we choose $V^m$ as the representative motion vector,
there will be a number of other motion vectors in the set whose distance measurements are also very close
to that of $V^m$. Certainly, the larger this number is, the more uncertainty we have, and the less reliable
the motion estimation is. Let

$$d_{\min}^m = \min_j d_j^m, \qquad d_{\max}^m = \max_j d_j^m. \tag{8}$$

Let

$$d_{\mathrm{th}}^m = d_{\min}^m + \alpha \cdot (d_{\max}^m - d_{\min}^m), \tag{9}$$

where $\alpha$ is a threshold value between 0 and 1. By default, we set $\alpha = 0.1$. Physically, $\alpha$
reflects the noise level. We pick out the subset of motion vectors in $\Lambda$ whose distance measurements
are very close to the minimum $d_{\min}^m$, and denote this subset by


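$$\Lambda_- = \{(V_k^m, d_k^m) \mid d_k^m \le d_{\mathrm{th}}^m\}. \tag{10}$$

Here, we relabel the elements in the set $\Lambda_-$ by index $k$, $1 \le k \le K^m \le L$. We choose the
mean of those motion vectors as their representative:

$$V^m = \frac{1}{K^m} \sum_{k=1}^{K^m} V_k^m. \tag{11}$$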


We define the reliability measure as

$$\gamma^m = \frac{1}{1 + \sum_{k=1}^{K^m} \|V_k^m - V^m\|^2}. \tag{12}$$

Here, $0 < \gamma^m \le 1$. If a motion search is reliable, either the value of $K^m$ will be small (close
to one, which implies a single minimum) or the motion vectors $V_k^m$ will be very close to each other. In
this case, the corresponding reliability measure $\gamma^m$ will be very close to 1.
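The representative motion vector and its reliability can be computed directly from the top candidate matches. The following sketch follows the thresholding, averaging, and reliability steps described above; the function name and argument layout are illustrative assumptions.

```python
import numpy as np

def representative_motion(vectors, distances, alpha=0.1):
    """Return (V, gamma) for one block from its top-L candidate matches.

    vectors   : (L, 2) array-like of candidate motion vectors
    distances : (L,) array-like of matching distances
    alpha     : threshold fraction between the best and worst distance
    """
    vectors = np.asarray(vectors, dtype=float)
    distances = np.asarray(distances, dtype=float)
    d_min, d_max = distances.min(), distances.max()
    d_th = d_min + alpha * (d_max - d_min)
    close = vectors[distances <= d_th]       # near-minimum candidates
    v_rep = close.mean(axis=0)               # their mean is the representative
    spread = np.sum(np.linalg.norm(close - v_rep, axis=1) ** 2)
    gamma = 1.0 / (1.0 + spread)             # 1 when unambiguous, smaller otherwise
    return v_rep, gamma
```

A block with a single clear minimum receives a reliability of exactly 1, while a block with two near-equal candidates four pixels apart receives 1/9, so it contributes far less to the camera motion fit.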

In camera motion parameter estimation, discussed in the next section, the reliability measure $\gamma^m$
acts as a weighting factor. Motion vectors with lower reliability, i.e., higher uncertainty, will have less
influence when determining the camera motion.

III  Camera  Motion  Parameter  Estimation

From the motion field analysis, we have obtained a representative motion vector $V^m$ (assumed to be the
true motion) and an associated reliability measure $\gamma^m$ for each structural block $B^m$,
$1 \le m \le M$. (An example value of $M$ is 60.) Based on this data set, we estimate the camera motion
parameters.

The camera view geometry is analyzed in Dr. Iyer's report. Let $(X, Y, Z)$ be the coordinate of the object.
Let $(x, y)$ be the pixel coordinate in the image, as illustrated in Fig. 1. We have

$$\dot{x} = \frac{V_{b1}}{Z}\left(x + \frac{V_{b3}}{V_{b1}}\right) + \Omega_X xy - \Omega_Y (1 + x^2) + \Omega_Z y, \tag{13}$$

$$\dot{y} = \frac{V_{b1}}{Z}\left(y - \frac{V_{b2}}{V_{b1}}\right) + \Omega_X (1 + y^2) - \Omega_Y xy - \Omega_Z x, \tag{14}$$

where $\Omega = (\Omega_X, \Omega_Y, \Omega_Z)$ is the angular velocity, and $(\dot{x}, \dot{y})$ is the
motion vector of the object, which is obtained from the motion field analysis. The unknown variables are
$(\Omega_X, \Omega_Y, \Omega_Z)$ and the range $Z$ for each pixel $(x, y)$.

III-A  Estimate  the  Angular  Velocity 

The angular velocity, denoted by $\Omega = (\Omega_X, \Omega_Y, \Omega_Z)$, plays an important role in UAV
navigation and obstacle avoidance. From other sensors, such as the gyroscope, we can get a rough estimate of
the body orientation, and the angular velocity $\Omega$ can be roughly estimated by taking the difference
between body orientation measurements at two time instants. Vision information, as another important source
of information, can help us refine the estimate of the angular velocity $\Omega$. Combining Eqs. (13) and
(14) and eliminating the pixel-dependent variable $Z$, we have

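$$D_1(x, y)\,\dot{x} - D_2(x, y)\,\dot{y} = C_1(x, y)\,\Omega_X + C_2(x, y)\,\Omega_Y + C_3(x, y)\,\Omega_Z, \tag{15}$$

where

$$D_1(x, y) = y - \frac{V_{b2}}{V_{b1}}, \tag{16}$$

$$D_2(x, y) = x + \frac{V_{b3}}{V_{b1}}, \tag{17}$$

$$C_1(x, y) = xy\left(y - \frac{V_{b2}}{V_{b1}}\right) - (1 + y^2)\left(x + \frac{V_{b3}}{V_{b1}}\right), \tag{18}$$

$$C_2(x, y) = -(1 + x^2)\left(y - \frac{V_{b2}}{V_{b1}}\right) + xy\left(x + \frac{V_{b3}}{V_{b1}}\right), \tag{19}$$

$$C_3(x, y) = y\left(y - \frac{V_{b2}}{V_{b1}}\right) + x\left(x + \frac{V_{b3}}{V_{b1}}\right). \tag{20}$$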
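As a brief sketch of the elimination step, write the rotational parts of Eqs. (13) and (14) as $R_x$ and $R_y$ (shorthand introduced here, not in the original), and assume the translational parts are $(V_{b1}/Z)\,D_2$ and $(V_{b1}/Z)\,D_1$, respectively. Multiplying (13) by $D_1$ and (14) by $D_2$ and subtracting gives

$$\begin{aligned}
D_1 \dot{x} - D_2 \dot{y} &= \frac{V_{b1}}{Z}\,(D_2 D_1 - D_1 D_2) + D_1 R_x - D_2 R_y \\
&= D_1 R_x - D_2 R_y,
\end{aligned}$$

so the depth-dependent term cancels. Collecting the coefficients of $\Omega_X$, $\Omega_Y$, and $\Omega_Z$ in $D_1 R_x - D_2 R_y$ yields $C_1 = xy\,D_1 - (1 + y^2)\,D_2$, $C_2 = -(1 + x^2)\,D_1 + xy\,D_2$, and $C_3 = y\,D_1 + x\,D_2$.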


The unknown variables $(\Omega_X, \Omega_Y, \Omega_Z)$ can be obtained with Least Mean Square Error (LMSE)
fitting weighted by the reliability. Let $(x^m, y^m)$ be the pixel coordinate of the center of block $B^m$.
From the motion field analysis, we have obtained the motion vector $(\dot{x}^m, \dot{y}^m)$ for this pixel
and the associated reliability measure $\gamma^m$. The weighted least-squares problem can be written in
matrix form as


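$$\Gamma A \Omega = \Gamma b, \tag{21}$$

where

$$A = \begin{bmatrix}
C_1(x^1, y^1) & C_2(x^1, y^1) & C_3(x^1, y^1) \\
C_1(x^2, y^2) & C_2(x^2, y^2) & C_3(x^2, y^2) \\
\vdots & \vdots & \vdots \\
C_1(x^M, y^M) & C_2(x^M, y^M) & C_3(x^M, y^M)
\end{bmatrix} \tag{22}$$

and

$$b = \begin{bmatrix}
D_1(x^1, y^1)\,\dot{x}^1 - D_2(x^1, y^1)\,\dot{y}^1 \\
D_1(x^2, y^2)\,\dot{x}^2 - D_2(x^2, y^2)\,\dot{y}^2 \\
\vdots \\
D_1(x^M, y^M)\,\dot{x}^M - D_2(x^M, y^M)\,\dot{y}^M
\end{bmatrix}, \qquad
\Gamma = \begin{bmatrix}
\gamma^1 & & & \\
& \gamma^2 & & \\
& & \ddots & \\
& & & \gamma^M
\end{bmatrix}. \tag{23}$$

The solution is given by

$$\Omega = [(\Gamma A)^T (\Gamma A)]^{-1} (\Gamma A)^T (\Gamma b). \tag{24}$$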
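The reliability-weighted fit can be sketched with NumPy. The code assumes the motion field model $\dot{x} = (V_{b1}/Z)(x + V_{b3}/V_{b1}) + \Omega_X xy - \Omega_Y(1+x^2) + \Omega_Z y$ and $\dot{y} = (V_{b1}/Z)(y - V_{b2}/V_{b1}) + \Omega_X(1+y^2) - \Omega_Y xy - \Omega_Z x$ in normalized image coordinates, and it solves the weighted system with `numpy.linalg.lstsq` rather than forming the normal equations explicitly, which gives the same least-squares solution with better numerical behavior. The function names are hypothetical.

```python
import numpy as np

def motion_field(x, y, inv_z, v_b, omega):
    """Image motion predicted by the assumed model; inv_z is 1/Z per point."""
    vb1, vb2, vb3 = v_b
    wx, wy, wz = omega
    x_dot = vb1 * inv_z * (x + vb3 / vb1) + wx * x * y - wy * (1 + x**2) + wz * y
    y_dot = vb1 * inv_z * (y - vb2 / vb1) + wx * (1 + y**2) - wy * x * y - wz * x
    return x_dot, y_dot

def estimate_omega(x, y, x_dot, y_dot, gamma, v_b):
    """Reliability-weighted least-squares estimate of the angular velocity."""
    vb1, vb2, vb3 = v_b
    d1 = y - vb2 / vb1
    d2 = x + vb3 / vb1
    a = np.column_stack([
        x * y * d1 - (1 + y**2) * d2,      # coefficient of Omega_X
        -(1 + x**2) * d1 + x * y * d2,     # coefficient of Omega_Y
        y * d1 + x * d2,                   # coefficient of Omega_Z
    ])
    b = d1 * x_dot - d2 * y_dot
    # Multiplying by the diagonal weight matrix scales each row by its reliability.
    omega, *_ = np.linalg.lstsq(gamma[:, None] * a, gamma * b, rcond=None)
    return omega
```

On noise-free synthetic motion vectors generated with `motion_field`, the estimator recovers the angular velocity regardless of the per-point depths, since depth has been eliminated from the fitted equation.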


III-B  Range  Estimation 

Once the camera motion is known, we are able to determine the range (or depth) $Z$ for each block
in the scene. As discussed in Section II, the image is partitioned into blocks $B^n$, $1 \le n \le N$. Let
$\Lambda = \{(\dot{x}_j^n, \dot{y}_j^n) \mid 1 \le j \le L\}$ be the top candidate motion vectors for block
$B^n$. If the block, which corresponds to an object in the scene, is stationary, the true motion vector must
satisfy Eqs. (13) and (14). Denote the right-hand sides of (13) and (14) by

$$f(x, y, Z) = \frac{V_{b1}}{Z}\left(x + \frac{V_{b3}}{V_{b1}}\right) + \Omega_X xy - \Omega_Y (1 + x^2) + \Omega_Z y, \tag{25}$$

$$g(x, y, Z) = \frac{V_{b1}}{Z}\left(y - \frac{V_{b2}}{V_{b1}}\right) + \Omega_X (1 + y^2) - \Omega_Y xy - \Omega_Z x. \tag{26}$$

If $(\dot{x}_j^n, \dot{y}_j^n)$ is the true motion, then the range of this block can be determined by least
mean squared error estimation:

$$Z_j = \arg\min_Z\ [\dot{x}_j^n - f(x, y, Z)]^2 + [\dot{y}_j^n - g(x, y, Z)]^2. \tag{27}$$

The corresponding fitting error is denoted by

$$E_j = [\dot{x}_j^n - f(x, y, Z_j)]^2 + [\dot{y}_j^n - g(x, y, Z_j)]^2. \tag{28}$$

Under the stationary-object assumption, the true motion should have the minimum fitting error. Let

$$j^* = \arg\min_j E_j. \tag{29}$$

The range of the block is given by $Z_{j^*}$, and the associated motion vector is
$(\dot{x}_{j^*}^n, \dot{y}_{j^*}^n)$.
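Because the assumed motion model is linear in $1/Z$, the minimization over $Z$ has a closed form, and the candidate selection reduces to a short loop. The sketch below uses the same model form as Eqs. (25) and (26) as reconstructed above; skipping candidates whose best-fit inverse range is not positive, returning `None` when no candidate is usable, and the function name are choices made here for illustration rather than part of the original method.

```python
def fit_range(x, y, candidates, v_b, omega):
    """Pick the candidate motion vector with the smallest fitting error.

    Returns (error, Z, (x_dot, y_dot)) or None if no candidate is usable.
    """
    vb1, vb2, vb3 = v_b
    wx, wy, wz = omega
    # Translational flow multiplied by Z, and the depth-independent rotational flow.
    tx = vb1 * (x + vb3 / vb1)
    ty = vb1 * (y - vb2 / vb1)
    rx = wx * x * y - wy * (1 + x**2) + wz * y
    ry = wx * (1 + y**2) - wy * x * y - wz * x
    denom = tx**2 + ty**2
    if denom == 0.0:                 # no translational flow: range unobservable
        return None
    best = None
    for x_dot, y_dot in candidates:
        # The residual is linear in u = 1/Z, so the minimizer is a projection.
        u = ((x_dot - rx) * tx + (y_dot - ry) * ty) / denom
        if u <= 0.0:                 # a physical range must be positive
            continue
        err = (x_dot - rx - u * tx) ** 2 + (y_dot - ry - u * ty) ** 2
        if best is None or err < best[0]:
            best = (err, 1.0 / u, (x_dot, y_dot))
    return best
```

For a pixel at (0.2, -0.1) with linear velocity (20, 1, -2), angular velocity (0.02, -0.01, 0.03), and a true range of 50, the model predicts a motion vector of (0.047, -0.046); given that vector among several distractors, the sketch selects it and returns a range of 50.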

IV  Experimentation  and  Performance  Analysis 

The  proposed  motion  field  estimation,  reliability  analysis,  camera  motion  estimation,  and  range  map 
computation  have  been  implemented  with  C  code.  We  take  two  major  steps  to  evaluate  this  vision  analysis 
system.  First,  we  are  going  to  test  the  system  using  the  multi-UAV  simulator.  The  simulator  is  able  to 
generate  a  video  sequence  for  the  UAV  camera  view  and  the  associated  camera  coordinate  system  and 
orientation.  In  addition,  since  the  simulator  knows  which  point  object  in  the  world  scene  is  mapped  to 
which  pixel  in  a  video  frame,  we  have  the  ground  truth  for  the  motion  fields,  as  well  as  the  range  map. 

In simulator-based performance evaluation, the inputs to the vision analysis system are the video frames,
the linear velocity, and the camera orientation. We add some level of noise to the camera orientation to
simulate the sensor measurement noise. The output of the system will be the angular velocity estimate and
the range map. We are going to check whether the vision information is able to reduce the noise in the
angular velocity and camera orientation measurements. We also compare the estimated range map against the
ground truth. In the second stage of evaluation, we will use the flight test data. The flight data contains
video from the UAV camera, its GPS location, and its orientation (with noise). It also contains the GPS
locations of the targets. The vision analysis system will estimate the range of the targets, and we will
compare the estimate against the actual measurement. In addition to the performance evaluation, we will
also analyze the impact of noise in the linear velocity on the estimation accuracy.

V  Concluding  Remarks 

We propose a hierarchical framework to deal with uncertainty and noise in motion field analysis, so as
to develop a low-complexity and reliable vision analysis system for UAV navigation. First, we classify the
image blocks into structural and non-structural, and use only the reliable motion information from
structural blocks for camera motion estimation. Second, we introduce reliability analysis into motion field
estimation and let motion vectors with higher reliability play a more influential role in camera motion
estimation. In this way, even if the local motion estimation is wrong inside some image regions, the overall
camera motion estimation remains accurate and robust thanks to the highly reliable structural blocks. Third,
we use the accurate estimate of camera motion to constrain the motion search for non-structural blocks,
which significantly reduces both the uncertainty and the computational complexity.